Systolic Array
   HOME

TheInfoList



OR:

In
parallel Parallel is a geometric term of location which may refer to: Computing * Parallel algorithm * Parallel computing * Parallel metaheuristic * Parallel (software), a UNIX utility for running programs in parallel * Parallel Sysplex, a cluster of ...
computer architectures In computer engineering, computer architecture is a description of the structure of a computer system made from component parts. It can sometimes be a high-level description that ignores details of the implementation. At a more detailed level, the ...
, a systolic array is a homogeneous
network Network, networking and networked may refer to: Science and technology * Network theory, the study of graphs as a representation of relations between discrete objects * Network science, an academic field that studies complex networks Mathematics ...
of tightly coupled
data processing unit A data processing unit (DPU) is a channel controller, a programmable specialized electronic circuit with hardware acceleration of data processing for data-centric computing. The data is transmitted to and from the component as multiplexed ...
s (DPUs) called cells or
node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines, ...
s. Each node or DPU independently computes a partial result as a function of the data received from its upstream neighbours, stores the result within itself and passes it downstream. Systolic arrays were first used in
Colossus Colossus, Colossos, or the plural Colossi or Colossuses, may refer to: Statues * Any exceptionally large statue ** List of tallest statues ** :Colossal statues * ''Colossus of Barletta'', a bronze statue of an unidentified Roman emperor * ''Col ...
, which was an early computer used to break German
Lorenz Lorenz is an originally German name derived from the Roman surname Laurentius, which means "from Laurentum". Given name People with the given name Lorenz include: * Prince Lorenz of Belgium (born 1955), member of the Belgian royal family by h ...
ciphers during
World War II World War II or the Second World War, often abbreviated as WWII or WW2, was a world war that lasted from 1939 to 1945. It involved the vast majority of the world's countries—including all of the great powers—forming two opposin ...
. Due to the classified nature of Colossus, they were independently invented or rediscovered by H. T. Kung and Charles Leiserson who described arrays for many dense linear algebra computations (matrix product, solving systems of
linear equation In mathematics, a linear equation is an equation that may be put in the form a_1x_1+\ldots+a_nx_n+b=0, where x_1,\ldots,x_n are the variables (or unknowns), and b,a_1,\ldots,a_n are the coefficients, which are often real numbers. The coefficien ...
s,
LU decomposition In numerical analysis and linear algebra, lower–upper (LU) decomposition or factorization factors a matrix as the product of a lower triangular matrix and an upper triangular matrix (see matrix decomposition). The product sometimes includes a pe ...
, etc.) for banded matrices. Early applications include computing
greatest common divisor In mathematics, the greatest common divisor (GCD) of two or more integers, which are not all zero, is the largest positive integer that divides each of the integers. For two integers ''x'', ''y'', the greatest common divisor of ''x'' and ''y'' is ...
s of integers and polynomials. They are sometimes classified as multiple-instruction single-data (MISD) architectures under
Flynn's taxonomy Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966 and extended in 1972. The classification system has stuck, and it has been used as a tool in design of modern processors and their functionalities. ...
, but this classification is questionable because a strong argument can be made to distinguish systolic arrays from any of Flynn's four categories:
SISD SISD can refer to: * Single instruction, single data, a computer processor architecture * CCL5, an 8kDa protein also using the symbol SISD * Sixteen-segment display * Several school districts in Texas. See List of school districts in Texas - S * S ...
,
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
,
MISD MISD is an acronym that may refer to: * Independent School Districts in Texas - M *Marion Independent School District (Iowa) Marion Independent School District is a public school district in Marion, Iowa. It consists of a high school, a mi ...
,
MIMD In computing, multiple instruction, multiple data (MIMD) is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be exe ...
, as discussed later in this article. The parallel input
data In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted ...
flows through a network of hard-wired
processor Processor may refer to: Computing Hardware * Processor (computing) **Central processing unit (CPU), the hardware within a computer that executes a program *** Microprocessor, a central processing unit contained on a single integrated circuit (I ...
nodes, which combine, process,
merge Merge, merging, or merger may refer to: Concepts * Merge (traffic), the reduction of the number of lanes on a road * Merge (linguistics), a basic syntactic operation in generative syntax in the Minimalist Program * Merger (politics), the comb ...
or
sort Sort may refer to: * Sorting, any process of arranging items in sequence or in sets ** Sorting algorithm, any algorithm for arranging elements in lists ** Sort (Unix), a Unix utility which sorts the lines of a file ** Sort (C++), a function in the ...
the input data into a derived result. Because the
wave In physics, mathematics, and related fields, a wave is a propagating dynamic disturbance (change from equilibrium) of one or more quantities. Waves can be periodic, in which case those quantities oscillate repeatedly about an equilibrium (res ...
-like propagation of data through a systolic array resembles the
pulse In medicine, a pulse represents the tactile arterial palpation of the cardiac cycle (heartbeat) by trained fingertips. The pulse may be palpated in any place that allows an artery to be compressed near the surface of the body, such as at the nec ...
of the human circulatory system, the name ''systolic'' was coined from medical terminology. The name is derived from
systole Systole ( ) is the part of the cardiac cycle during which some chambers of the heart contract after refilling with blood. The term originates, via New Latin, from Ancient Greek (''sustolē''), from (''sustéllein'' 'to contract'; from ''sun ...
as an analogy to the regular pumping of blood by the heart.


Applications

Systolic arrays are often hard-wired for specific operations, such as "multiply and accumulate", to perform massively
parallel Parallel is a geometric term of location which may refer to: Computing * Parallel algorithm * Parallel computing * Parallel metaheuristic * Parallel (software), a UNIX utility for running programs in parallel * Parallel Sysplex, a cluster of ...
integration,
convolution In mathematics (in particular, functional analysis), convolution is a operation (mathematics), mathematical operation on two function (mathematics), functions ( and ) that produces a third function (f*g) that expresses how the shape of one is ...
,
correlation In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics ...
,
matrix multiplication In mathematics, particularly in linear algebra, matrix multiplication is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the s ...
or data sorting tasks. They are also used for
dynamic programming Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. I ...
algorithms, used in DNA and protein
sequence analysis In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. Methodologies used include sequence alignm ...
.


Architecture

A systolic array typically consists of a large
monolithic A monolith is a monument or natural feature consisting of a single massive stone or rock. Monolith or monolithic may also refer to: Architecture * Monolithic architecture, a style of construction in which a building is carved, cast or excavated ...
network Network, networking and networked may refer to: Science and technology * Network theory, the study of graphs as a representation of relations between discrete objects * Network science, an academic field that studies complex networks Mathematics ...
of primitive computing
node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines, ...
s which can be hardwired or software configured for a specific application. The nodes are usually fixed and identical, while the interconnect is programmable. The more general wave front processors, by contrast, employ sophisticated and individually programmable nodes which may or may not be monolithic, depending on the array size and design parameters. The other distinction is that systolic arrays rely on
synchronous Synchronization is the coordination of events to operate a system in unison. For example, the conductor of an orchestra keeps the orchestra synchronized or ''in time''. Systems that operate with all parts in synchrony are said to be synchronou ...
data transfers, while
wavefront In physics, the wavefront of a time-varying ''wave field'' is the set (locus) of all points having the same ''phase''. The term is generally meaningful only for fields that, at each point, vary sinusoidally in time with a single temporal freque ...
tend to work
asynchronous Asynchrony is the state of not being in synchronization. Asynchrony or asynchronous may refer to: Electronics and computing * Asynchrony (computer programming), the occurrence of events independent of the main program flow, and ways to deal with ...
ly. Unlike the more common
Von Neumann architecture The von Neumann architecture — also known as the von Neumann model or Princeton architecture — is a computer architecture based on a 1945 description by John von Neumann, and by others, in the ''First Draft of a Report on the EDVAC''. The ...
, where program execution follows a script of instructions stored in common memory, addressed and sequenced under the control of the CPU's
program counter The program counter (PC), commonly called the instruction pointer (IP) in Intel x86 and Itanium microprocessors, and sometimes called the instruction address register (IAR), the instruction counter, or just part of the instruction sequencer, is ...
(PC), the individual nodes within a systolic array are triggered by the arrival of new data and always process the data in exactly the same way. The actual processing within each node may be hard wired or block micro coded, in which case the common node personality can be block programmable. The systolic array paradigm with data-streams driven by data
counter Counter may refer to: Mathematics and computing * Counter machine, a subclass of register machines * Counter (digital), an electronic device, mechanical device, or computer program for counting * Loop counter, the variable that controls the iter ...
s, is the counterpart of the Von Neumann architecture with instruction-stream driven by a program counter. Because a systolic array usually sends and receives multiple data streams, and multiple data counters are needed to generate these data streams, it supports
data parallelism Data parallelism is parallelization across multiple processors in parallel computing environments. It focuses on distributing the data across different nodes, which operate on the data in parallel. It can be applied on regular data structures like ...
.


Goals and benefits

A major benefit of systolic arrays is that all operand data and partial results are stored within (passing through) the processor array. There is no need to access external buses, main memory or internal caches during each operation as is the case with Von Neumann or
Harvard Harvard University is a private Ivy League research university in Cambridge, Massachusetts. Founded in 1636 as Harvard College and named for its first benefactor, the Puritan clergyman John Harvard, it is the oldest institution of higher le ...
sequential machines. The sequential limits on
parallel Parallel is a geometric term of location which may refer to: Computing * Parallel algorithm * Parallel computing * Parallel metaheuristic * Parallel (software), a UNIX utility for running programs in parallel * Parallel Sysplex, a cluster of ...
performance dictated by
Amdahl's Law In computer architecture, Amdahl's law (or Amdahl's argument) is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It states that ...
also do not apply in the same way, because data dependencies are implicitly handled by the programmable
node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines, ...
interconnect and there are no sequential steps in managing the highly parallel data flow. Systolic arrays are therefore extremely good at artificial intelligence, image processing, pattern recognition, computer vision and other tasks which animal brains do so particularly well. Wavefront processors in general can also be very good at machine learning by implementing self configuring neural nets in hardware.


Classification controversy

While systolic arrays are officially classified as
MISD MISD is an acronym that may refer to: * Independent School Districts in Texas - M *Marion Independent School District (Iowa) Marion Independent School District is a public school district in Marion, Iowa. It consists of a high school, a mi ...
, their classification is somewhat problematic. Because the input is typically a vector of independent values, the systolic array is definitely not
SISD SISD can refer to: * Single instruction, single data, a computer processor architecture * CCL5, an 8kDa protein also using the symbol SISD * Sixteen-segment display * Several school districts in Texas. See List of school districts in Texas - S * S ...
. Since these input values are merged and combined into the result(s) and do not maintain their
independence Independence is a condition of a person, nation, country, or state in which residents and population, or some portion thereof, exercise self-government, and usually sovereignty, over its territory. The opposite of independence is the statu ...
as they would in a
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
vector processing unit, the
array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
cannot be classified as such. Consequently, the array cannot be classified as a
MIMD In computing, multiple instruction, multiple data (MIMD) is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be exe ...
either, because MIMD can be viewed as a mere collection of smaller SISD and
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
machines. Finally, because the data
swarm Swarm behaviour, or swarming, is a collective behaviour exhibited by entities, particularly animals, of similar size which aggregate together, perhaps milling about the same spot or perhaps moving ''en masse'' or migrating in some direction. ...
is transformed as it passes through the array from
node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines, ...
to node, the multiple nodes are not operating on the same data, which makes the MISD classification a
misnomer A misnomer is a name that is incorrectly or unsuitably applied. Misnomers often arise because something was named long before its correct nature was known, or because an earlier form of something has been replaced by a later form to which the name ...
. The other reason why a systolic array should not qualify as a MISD is the same as the one which disqualifies it from the SISD category: The input data is typically a vector not a single data value, although one could argue that any given input vector is a single item of data. In spite of all of the above, systolic arrays are often offered as a classic example of MISD architecture in textbooks on
parallel computing Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different fo ...
and in engineering classes. If the array is viewed from the outside as atomic it should perhaps be classified as SFMuDMeR = Single Function, Multiple Data, Merged Result(s). Systolic arrays use a pre-defined computational flow graph that connects their nodes.
Kahn process networks A Kahn process network (KPN, or process network) is a distributed ''model of computation'' in which a group of deterministic sequential processes communicate through unbounded first in, first out channels. The model requires that reading from a c ...
use a similar flow graph, but are distinguished by the nodes working in lock-step in the systolic array: in a Kahn network, there are FIFO queues between each node.


Detailed description

A systolic array is composed of matrix-like rows of
data processing unit A data processing unit (DPU) is a channel controller, a programmable specialized electronic circuit with hardware acceleration of data processing for data-centric computing. The data is transmitted to and from the component as multiplexed ...
s called cells. Data processing units (DPUs) are similar to
central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, an ...
s (CPUs), (except for the usual lack of a
program counter The program counter (PC), commonly called the instruction pointer (IP) in Intel x86 and Itanium microprocessors, and sometimes called the instruction address register (IAR), the instruction counter, or just part of the instruction sequencer, is ...
, since operation is transport-triggered, i.e., by the arrival of a data object). Each cell shares the information with its neighbors immediately after processing. The systolic array is often rectangular where data flows across the array between neighbour DPUs, often with different data flowing in different directions. The data streams entering and leaving the ports of the array are generated by auto-sequencing memory units, ASMs. Each ASM includes a data
counter Counter may refer to: Mathematics and computing * Counter machine, a subclass of register machines * Counter (digital), an electronic device, mechanical device, or computer program for counting * Loop counter, the variable that controls the iter ...
. In
embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' as ...
s a data stream may also be input from and/or output to an external source. An example of a systolic
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
might be designed for
matrix multiplication In mathematics, particularly in linear algebra, matrix multiplication is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the s ...
. One
matrix Matrix most commonly refers to: * ''The Matrix'' (franchise), an American media franchise ** ''The Matrix'', a 1999 science-fiction action film ** "The Matrix", a fictional setting, a virtual reality environment, within ''The Matrix'' (franchis ...
is fed in a row at a time from the top of the array and is passed down the array, the other matrix is fed in a column at a time from the left hand side of the array and passes from left to right. Dummy values are then passed in until each processor has seen one whole row and one whole column. At this point, the result of the multiplication is stored in the array and can now be output a row or a column at a time, flowing down or across the array. Systolic arrays are arrays of DPUs which are connected to a small number of nearest neighbour DPUs in a mesh-like topology. DPUs perform a sequence of operations on data that flows between them. Because the traditional systolic array synthesis methods have been practiced by algebraic algorithms, only uniform arrays with only linear pipes can be obtained, so that the architectures are the same in all DPUs. The consequence is, that only applications with regular data dependencies can be implemented on classical systolic arrays. Like
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
machines, clocked systolic arrays compute in "lock-step" with each processor undertaking alternate compute , communicate phases. But systolic arrays with asynchronous handshake between DPUs are called ''wavefront arrays''. One well-known systolic array is Carnegie Mellon University's
iWarp iWARP is a computer networking protocol that implements remote direct memory access (RDMA) for efficient data transfer over Internet Protocol networks. Contrary to some accounts, iWARP is not an acronym. Because iWARP is layered on Internet Eng ...
processor, which has been manufactured by Intel. An iWarp system has a linear array processor connected by data buses going in both directions.


History

Systolic arrays (also known as ''wavefront processors''), were first described by H. T. Kung and
Charles E. Leiserson Charles Eric Leiserson is a computer scientist, specializing in the theory of parallel computing and distributed computing, and particularly practical applications thereof. As part of this effort, he developed the Cilk multithreaded language. ...
, who published the first paper describing systolic arrays in 1979. However, the first machine known to have used a similar technique was the Colossus Mark II in 1944.


Application example

;Polynomial evaluation
Horner's rule In mathematics and computer science, Horner's method (or Horner's scheme) is an algorithm for polynomial evaluation. Although named after William George Horner, this method is much older, as it has been attributed to Joseph-Louis Lagrange by Hor ...
for evaluating a polynomial is: : y = ( \ldots ( ( (a_n \cdot x + a_) \cdot x + a_) \cdot x + a_) \cdot x + \ldots + a_1) \cdot x + a_0. A linear systolic array in which the processors are arranged in pairs: one multiplies its input by x and passes the result to the right, the next adds a_j and passes the result to the right.


Advantages and disadvantages

Pros * Faster than general purpose processors * Scalable Cons * Expensive, due to low economy of scale * Highly specialized, custom hardware is required and often application specific. * Not widely implemented * Limited code base of programs and algorithms. (Not all algorithms can be implemented as systolic arrays. Often tricks are needed to map such algorithms on to a systolic array.)


Implementations

*
Cisco Cisco Systems, Inc., commonly known as Cisco, is an American-based multinational digital communications technology conglomerate corporation headquartered in San Jose, California. Cisco develops, manufactures, and sells networking hardware, ...
PXF network processor is internally organized as systolic array. * Google’s TPU is also designed around a systolic array. * Paracel FDF4T TestFinder text search system * Paracel FDF4G GeneMatcher Biological (DNA and Protein) search system * Inferentia chip at
Amazon Web Services Amazon Web Services, Inc. (AWS) is a subsidiary of Amazon.com, Amazon that provides Software as a service, on-demand cloud computing computing platform, platforms and Application programming interface, APIs to individuals, companies, and gover ...
*
MIT Eyeriss The Massachusetts Institute of Technology (MIT) is a private land-grant research university in Cambridge, Massachusetts. Established in 1861, MIT has played a key role in the development of modern technology and science, and is one of the mo ...
is a systolic array accelerator for convolutional neural networks.


See also

*
MISD MISD is an acronym that may refer to: * Independent School Districts in Texas - M *Marion Independent School District (Iowa) Marion Independent School District is a public school district in Marion, Iowa. It consists of a high school, a mi ...
– Multiple Instruction Single Data, Example: Systolic Arrays *
iWarp iWARP is a computer networking protocol that implements remote direct memory access (RDMA) for efficient data transfer over Internet Protocol networks. Contrary to some accounts, iWARP is not an acronym. Because iWARP is layered on Internet Eng ...
– Systolic Array Computer, VLSI, Intel/CMU * WARP (systolic array) – Systolic Array Computer, GE/CMU *
Tensor Processing Unit Tensor Processing Unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google for Artificial neural network, neural network machine learning, using Google's own TensorFlow software. Google began using TPUs ...
AI accelerator
ASIC An application-specific integrated circuit (ASIC ) is an integrated circuit (IC) chip customized for a particular use, rather than intended for general-purpose use, such as a chip designed to run in a digital voice recorder or a high-efficien ...


Notes


References

* H. T. Kung, C. E. Leiserson: Algorithms for VLSI processor arrays; in: C. Mead, L. Conway (eds.): Introduction to VLSI Systems; Addison-Wesley, 1979 * S. Y. Kung: VLSI Array Processors; Prentice-Hall, Inc., 1988 * N. Petkov: Systolic Parallel Processing; North Holland Publishing Co, 1992


External links


''Instruction Systolic Array (ISA)''

'A VLSI Architecture for Image Registration in Real Time' (Based on systolic array), Vol. 15, September 2007
{{DEFAULTSORT:Systolic Array Parallel computing Reconfigurable computing